Scoring rules in survival analysis
Scoring rules promote rational and good decision making and predictions by
models; this is increasingly important for automated `auto-ML' procedures.
The Brier score and Log loss are well-established scoring rules for
classification and regression and possess the `strict properness' property that
encourages optimal predictions. In this paper we survey proposed scoring rules
for survival analysis, establish the first clear definition of `(strict)
properness' for survival scoring rules, and determine which losses are proper
and improper. We prove that commonly utilised scoring rules that are claimed to
be proper are in fact improper. We further prove that under a strict set of
assumptions a class of scoring rules is strictly proper for, what we term,
`approximate' survival losses. We hope these findings encourage further
research into robust validation of survival models and promote honest
evaluation
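As an informal illustration of the censoring-adjusted log loss discussed in this line of work: the loss evaluates the predicted density at observed event times and the predicted survival function at censoring times. A minimal Python sketch, where the function name and the exponential prediction are illustrative and not the paper's notation:

```python
import numpy as np

def survival_log_loss(event_time, status, density, survival):
    """Right-censored log loss sketch: -log f(t) when the event is
    observed (status == 1), -log S(t) when the observation is censored.
    `density` and `survival` are callables for the predicted f and S."""
    if status == 1:
        return -np.log(density(event_time))       # event observed at t
    return -np.log(survival(event_time))          # censored at t

# Example with an exponential prediction, rate = 0.5:
rate = 0.5
f = lambda t: rate * np.exp(-rate * t)
S = lambda t: np.exp(-rate * t)

print(round(survival_log_loss(2.0, 1, f, S), 4))  # → 1.6931
```

Whether such a loss is (strictly) proper under censoring is exactly the question the paper formalises; the sketch only shows how the loss is computed.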
NIPS - Not Even Wrong? A Systematic Review of Empirically Complete Demonstrations of Algorithmic Effectiveness in the Machine Learning and Artificial Intelligence Literature
Objective: To determine the completeness of argumentative steps necessary to
conclude effectiveness of an algorithm in a sample of current ML/AI supervised
learning literature.
Data Sources: Papers published in the Neural Information Processing Systems
(NeurIPS, née NIPS) journal where the official record showed a 2017 year of
publication.
Eligibility Criteria: Studies reporting a (semi-)supervised model, or
pre-processing fused with (semi-)supervised models for tabular data.
Study Appraisal: Three reviewers applied the assessment criteria to determine
argumentative completeness. The criteria were split into three groups,
including: experiments (e.g. real and/or synthetic data), baselines (e.g.
uninformed and/or state-of-the-art) and quantitative comparison (e.g. performance
quantifiers with confidence intervals and formal comparison of the algorithm
against baselines).
Results: Of the 121 eligible manuscripts (from the sample of 679 abstracts),
99% used real-world data and 29% used synthetic data. 91% of manuscripts did
not report an uninformed baseline and 55% reported a state-of-the-art baseline.
32% reported confidence intervals for performance but none provided references
or exposition for how these were calculated. 3% reported formal comparisons.
Limitations: The use of one journal as the primary information source may not
be representative of all ML/AI literature. However, the NeurIPS conference is
recognised as among the top-tier venues for ML/AI studies, so it is reasonable
to consider its corpus representative of high-quality research.
Conclusion: Using the 2017 sample of the NeurIPS supervised learning corpus
as an indicator for the quality and trustworthiness of current ML/AI research,
it appears that complete argumentative chains in demonstrations of algorithmic
effectiveness are rare.
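For context on the "confidence intervals for performance" criterion above, one common approach (of many, and not one prescribed by the review) is a percentile bootstrap over per-instance scores. A hedged Python sketch; all names and data are illustrative:

```python
import random

def bootstrap_ci(scores, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap CI for the mean of per-instance scores.
    Illustrative only: one of several ways to quantify uncertainty
    around a reported performance figure."""
    rng = random.Random(seed)
    n = len(scores)
    means = sorted(
        sum(rng.choice(scores) for _ in range(n)) / n
        for _ in range(n_boot)
    )
    lo = means[int(alpha / 2 * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# e.g. 0/1 correctness indicators from a small test set
scores = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
lo, hi = bootstrap_ci(scores)
print(lo <= 0.7 <= hi)  # the point estimate (0.7) lies in the interval
```

The review's finding was not that such methods are hard, but that papers rarely state which method (if any) produced their intervals.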
A theoretical and methodological framework for machine learning in survival analysis: Enabling transparent and accessible predictive modelling on right-censored time-to-event data
Survival analysis is an important field of Statistics concerned with making time-to-event predictions with ‘censored’ data. Machine learning, specifically supervised learning, is the field of Statistics concerned with using state-of-the-art algorithms in order to make predictions on unseen data. This thesis looks at unifying these two fields as current research into the two is still disjoint, with ‘classical survival’ on one side and supervised learning (primarily classification and regression) on the other. This PhD aims to improve the quality of machine learning research in survival analysis by focusing on transparency, accessibility, and predictive performance in model building and evaluation. This is achieved by examining historic and current proposals and implementations for models and measures (both classical and machine learning) in survival analysis and making novel contributions. In particular this includes: i) a survey of survival models including a critical and technical survey of almost all supervised learning model classes currently utilised in survival, as well as novel adaptations; ii) a survey of evaluation measures for survival models, including key definitions, proofs and theorems for survival scoring rules that had previously been missing from the literature; iii) introduction and formalisation of composition and reduction in survival analysis, with a view on increasing transparency of modelling strategies and improving predictive performance; iv) implementation of several R software packages, in particular mlr3proba for machine learning in survival analysis; and v) the first large-scale benchmark experiment on right-censored time-to-event data with 24 survival models and 66 datasets.
Survival analysis has many important applications in medical statistics, engineering and finance, and as such requires the same level of rigour as other machine learning fields such as regression and classification; this thesis aims to make this clear by describing a framework from prediction and evaluation to implementation.
distr6: R6 Object-Oriented Probability Distributions Interface in R
distr6 is an object-oriented (OO) probability distributions interface
leveraging the extensibility and scalability of R6, and the speed and
efficiency of Rcpp. Over 50 probability distributions are currently implemented
in the package with `core' methods including density, distribution, and
generating functions, and more `exotic' ones including hazards and distribution
function anti-derivatives. In addition to simple distributions, distr6 supports
compositions such as truncation, mixtures, and product distributions. This
paper presents the core functionality of the package and demonstrates examples
for key use-cases. In addition this paper provides a critical review of the
object-oriented programming paradigms in R and describes some novel
implementations for design patterns and core object-oriented features
introduced by the package for supporting distr6 components.
Comment: Accepted in The R Journal.
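A toy sketch of the kind of OO distribution interface described above, written in Python rather than R6 and deliberately not the actual distr6 API: core methods (pdf, cdf, hazard) live on a base class, and a composition such as truncation is a wrapper around another distribution. All class and method names here are illustrative:

```python
import math

class Distribution:
    """Minimal OO distribution interface, loosely inspired by the
    design distr6 describes; not the distr6 API itself."""
    def pdf(self, x): raise NotImplementedError
    def cdf(self, x): raise NotImplementedError
    def survival(self, x): return 1.0 - self.cdf(x)
    def hazard(self, x): return self.pdf(x) / self.survival(x)

class Exponential(Distribution):
    def __init__(self, rate): self.rate = rate
    def pdf(self, x): return self.rate * math.exp(-self.rate * x)
    def cdf(self, x): return 1.0 - math.exp(-self.rate * x)

class Truncated(Distribution):
    """Truncation as a composition: renormalise the wrapped
    distribution's mass to the interval [lower, upper]."""
    def __init__(self, dist, lower, upper):
        self.dist, self.lower, self.upper = dist, lower, upper
        self._mass = dist.cdf(upper) - dist.cdf(lower)
    def pdf(self, x):
        if not (self.lower <= x <= self.upper):
            return 0.0
        return self.dist.pdf(x) / self._mass
    def cdf(self, x):
        if x < self.lower: return 0.0
        if x > self.upper: return 1.0
        return (self.dist.cdf(x) - self.dist.cdf(self.lower)) / self._mass

exp_dist = Exponential(rate=1.0)
print(round(exp_dist.hazard(2.0), 6))  # → 1.0 (constant hazard = rate)
```

The design choice mirrored here is the one the paper discusses: ‘exotic’ methods such as the hazard come for free on every distribution once pdf and cdf are defined, and compositions remain full distributions.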
Deep Learning for Survival Analysis: A Review
The influx of deep learning (DL) techniques into the field of survival
analysis in recent years, coupled with the increasing availability of
high-dimensional omics data and unstructured data like images or text, has led
to substantial methodological progress; for instance, learning from such
high-dimensional or unstructured data. Numerous modern DL-based survival
methods have been developed since the mid-2010s; however, they often address
only a small subset of scenarios in the time-to-event data setting - e.g.,
single-risk right-censored survival tasks - and neglect to incorporate more
complex (and common) settings. Partially, this is due to a lack of exchange
between experts in the respective fields.
In this work, we provide a comprehensive systematic review of DL-based
methods for time-to-event analysis, characterizing them according to both
survival- and DL-related attributes. In doing so, we hope to provide a helpful
overview to practitioners who are interested in DL techniques applicable to
their specific use case as well as to enable researchers from both fields to
identify directions for future investigation. We provide a detailed
characterization of the methods included in this review as an open-source,
interactive table: https://survival-org.github.io/DL4Survival. As this research
area is advancing rapidly, we encourage the research community to contribute to
keeping the information up to date.
Comment: 24 pages, 6 figures, 2 tables, 1 interactive table.
Avoiding C-hacking when evaluating survival distribution predictions with discrimination measures
Motivation: In this paper we consider how to evaluate survival distribution predictions with measures of discrimination. This is non-trivial as discrimination measures are the most commonly used in survival analysis and yet there is no clear method to derive a risk prediction from a distribution prediction. We survey methods proposed in literature and software and consider their respective advantages and disadvantages.
Results: Whilst distributions are frequently evaluated by discrimination measures, we find that the method for doing so is rarely described in the literature and often leads to unfair comparisons or ‘C-hacking’. We demonstrate by example how simple it can be to manipulate results and use this to argue for better reporting guidelines and transparency in the literature. We recommend that machine learning survival analysis software implements clear transformations between distribution and risk predictions in order to allow more transparent and accessible model evaluation.
Availability: The code used in the final experiment is available at https://github.com/RaphaelS1/distribution_discrimination.
Supplementary information: Supplementary data are available at Bioinformatics online.
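To make the distribution-to-risk problem concrete: one possible transformation (of several the paper surveys) takes the risk as 1 - S(τ) at a fixed horizon τ, then feeds it to Harrell's C. The Python sketch below, with all names and data illustrative, shows that pipeline; the paper's point is that the chosen transformation must be reported, since different choices can rank subjects differently:

```python
def concordance(risk, time, status):
    """Harrell's C: among comparable pairs, the fraction where the
    subject with the earlier event has the higher predicted risk
    (ties in risk count as 0.5)."""
    num = den = 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            if status[i] == 1 and time[i] < time[j]:  # comparable pair
                den += 1
                if risk[i] > risk[j]:
                    num += 1
                elif risk[i] == risk[j]:
                    num += 0.5
    return num / den

# Distribution predictions summarised as survival probability at a
# horizon tau; one transformation: risk = 1 - S(tau).
surv_at_tau = [0.2, 0.5, 0.9]      # predicted S(tau) per subject
risk = [1 - s for s in surv_at_tau]
time = [1.0, 2.0, 3.0]
status = [1, 1, 1]
print(concordance(risk, time, status))  # → 1.0: risks ordered with times
```

Swapping in a different transformation (say, negated median survival time) would produce another risk ranking, and with it potentially another C value for the same distribution predictions, which is precisely the ‘C-hacking’ opportunity.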
Flexible Group Fairness Metrics for Survival Analysis
Algorithmic fairness is an increasingly important field concerned with
detecting and mitigating biases in machine learning models. There has been a
wealth of literature for algorithmic fairness in regression and classification
however there has been little exploration of the field for survival analysis.
Survival analysis is the prediction task in which one attempts to predict the
probability of an event occurring over time. Survival predictions are
particularly important in sensitive settings such as when utilising machine
learning for diagnosis and prognosis of patients. In this paper we explore how
to utilise existing survival metrics to measure bias with group fairness
metrics. We explore this in an empirical experiment with 29 survival datasets
and 8 measures. We find that measures of discrimination are able to capture
bias well whereas there is less clarity with measures of calibration and
scoring rules. We suggest further areas for research including prediction-based
fairness metrics for distribution predictions.
Comment: Accepted in DSHealth 2022 (Workshop on Applied Data Science for Healthcare).
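A minimal sketch of the general idea of pairing a survival metric with a group fairness summary: compute a discrimination measure (here Harrell's C) separately per group and report the absolute gap. This is illustrative Python under simplifying assumptions (two groups, all pairs comparable), not the paper's exact metrics or datasets:

```python
def cindex(risk, time, status):
    """Harrell's C over comparable pairs (ties in risk count 0.5)."""
    num = den = 0
    for i in range(len(time)):
        for j in range(len(time)):
            if status[i] == 1 and time[i] < time[j]:
                den += 1
                num += 1 if risk[i] > risk[j] else 0.5 if risk[i] == risk[j] else 0
    return num / den if den else float("nan")

def cindex_gap(risk, time, status, group):
    """Group-fairness summary: absolute difference in C-index between
    the two groups present in `group` (illustrative definition)."""
    per_group = {}
    for g in set(group):
        idx = [i for i, gi in enumerate(group) if gi == g]
        per_group[g] = cindex([risk[i] for i in idx],
                              [time[i] for i in idx],
                              [status[i] for i in idx])
    a, b = per_group.values()
    return abs(a - b)

risk   = [0.9, 0.1, 0.2, 0.8]
time   = [1.0, 2.0, 1.0, 2.0]
status = [1, 1, 1, 1]
group  = ["A", "A", "B", "B"]
print(cindex_gap(risk, time, status, group))  # → 1.0 (perfect vs inverted)
```

A gap near zero would indicate similar discrimination across groups; the paper's empirical finding is that discrimination-based gaps like this capture bias well, whereas calibration and scoring-rule analogues are less clear-cut.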
Genomics and epidemiology of the P.1 SARS-CoV-2 lineage in Manaus, Brazil
Cases of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection in Manaus, Brazil, resurged in late 2020 despite previously high levels of infection. Genome sequencing of viruses sampled in Manaus between November 2020 and January 2021 revealed the emergence and circulation of a novel SARS-CoV-2 variant of concern. Lineage P.1 acquired 17 mutations, including a trio in the spike protein (K417T, E484K, and N501Y) associated with increased binding to the human ACE2 (angiotensin-converting enzyme 2) receptor. Molecular clock analysis shows that P.1 emergence occurred around mid-November 2020 and was preceded by a period of faster molecular evolution. Using a two-category dynamical model that integrates genomic and mortality data, we estimate that P.1 may be 1.7- to 2.4-fold more transmissible and that previous (non-P.1) infection provides 54 to 79% of the protection against infection with P.1 that it provides against non-P.1 lineages. Enhanced global genomic surveillance of variants of concern, which may exhibit increased transmissibility and/or immune evasion, is critical to accelerate pandemic responsiveness.